Abstract: Recent increases in energy prices have led researchers
to seek better ways of capacity provisioning in data centers to
reduce the energy wasted due to variation in workload. This
paper explores the opportunity for cost saving and proposes a
novel approach for capacity provisioning under bounded latency
requirements for the workload. We investigate how many servers
should be kept active and how much workload should be delayed to
save energy while meeting every deadline. We present an offline LP
formulation for capacity provisioning by dynamic deferral and
give two online algorithms that determine the capacity of the data
center and the assignment of workload to servers dynamically. We
prove the feasibility of the online algorithms and show that their
worst-case performance is bounded by a constant factor with
respect to the offline formulation. We validate our algorithms on
synthetic workload generated from two real HTTP traces and
show that they perform much better in practice than the worst
case, resulting in 20-40% cost savings.

I. Introduction 

With the advent of cloud computing, data centers are emerging
all over the world and their energy consumption has become
significant: an estimated 61 million MWh, ~1.5% of US
electricity consumption, costing about 4.5 billion dollars.
Naturally, energy efficiency in data centers has been pursued
in various ways including the use of renewable energy O, O 
and improved cooling efficiency [4], [5], [6], etc. Among them,
an improved scheduling algorithm is a promising approach because of
its broad applicability regardless of hardware configuration.
While there are a number of works on this approach as well
(e.g., Q), one non-conventional perspective is to optimize 
the schedule so that a certain performance metric satisfies a
predetermined requirement, which is normally defined in the
form of service level agreements (SLAs). Specifically, latency
is an important performance metric for any web-based service
and is of great interest to service providers who run their
services in data centers.

In this paper, we are interested in minimizing the energy
consumption of a data center under guarantees on latency/deadline.
We use the deadline information to defer some tasks so that we can
reduce the total cost of energy consumption for executing the
workload and switching the state of the servers. We determine the
portion of the released workload to be executed at the current time
and the portions to be deferred to later time slots without
violating deadlines. Our approach is similar to 'valley filling',
which is widely used in data centers to utilize server capacity
during periods of low load [7]. But the load used for valley
filling is mostly background/maintenance tasks (e.g., web indexing,
data backup), which are different from the actual workload. In
fact, current valley filling approaches ignore the workload
characteristics for capacity provisioning. In this paper, we
determine how much work to store for valley filling in order to
reduce the current and future energy consumption. Later we
generalize our approach to workloads in which different jobs have
different deadlines.

The contribution of this paper is twofold. First, we present
an LP formulation for capacity provisioning with dynamic
deferral of workload. The formulation determines not only the
capacity but also the assignment of workload for each time slot.
As a result, the utilization of each server can be determined
easily and resources can be allocated accordingly. This method
therefore adapts well to other scheduling policies that take into
account dynamic resource allocation, priority-aware scheduling,
etc.

Second, we design two optimization-based online algorithms
depending on the nature of the deadline. For uniform deadlines,
our algorithm, named Valley Filling with Workload (VFW($\delta$)),
looks ahead $\delta$ slots to optimize the total energy consumption.
The algorithm uses the valley filling approach to accumulate
some workload to execute in periods of low load. For nonuniform
deadlines, we design a Generalized Capacity Provisioning (GCP)
algorithm that reduces the switching (on/off) of servers by
balancing the workload in adjacent time slots and thus reduces
energy consumption. We prove the feasibility of the solutions and
show that the performance of the online algorithms is bounded by a
constant factor with respect to the offline formulation in the
worst case. Because the proof assumes nothing about the workload,
the bound is pessimistic, and both algorithms perform much better
in practice than the worst case, as shown by experiments. We used
HTTP traces as examples of dynamic workload and found more than 40%
total cost saving for GCP and around 20% total cost saving for
VFW($\delta$) even for small deadline requirements. We compared
the two online algorithms with different parameter settings
and found that GCP gives more cost savings than VFW($\delta$) for
typical workload, but for bursty workload VFW($\delta$) sometimes
performs better than GCP.

The rest of the paper is organized as follows. Section II
presents the model that we use to formulate the optimization
and gives the offline formulation. In Section III, we present
the VFW($\delta$) algorithm for determining capacity and workload
assignment dynamically when the deadline is uniform. In
Section IV, we describe the GCP algorithm for nonuniform
deadlines. Section V shows the experimental results. In Section
VI, we describe the state-of-the-art research related to capacity
provisioning, and Section VII concludes the paper.

II. Model Formulation 

In this section, we describe the model we use for capacity 
provisioning via dynamic deferral. The assumptions used in 
this model are minimal and this formulation captures many 
properties of current data center capacity and workload char- 
acteristics. 

A. Workload Model 

We consider a workload model where the total workload
varies over time. The time interval we are interested in is
$t \in \{0, 1, \ldots, T\}$ where $T$ can be arbitrarily large. In
practice, $T$ can be a year and the length of a time slot $\tau$
could be as small as 2 minutes (the minimum time required to change
the power state of a server). In our model, the jobs have length
less than $\tau$ and each job has a deadline $D$ associated with it
within which it needs to be executed. If the length of a job
is greater than $\tau$ then we can safely decompose it into small
pieces ($< \tau$), each of which has deadline $D$. Hence we do
not distinguish individual jobs, but rather deal with the total
amount of workload. For now, assume that the deadline is uniform
for all the workload; the nonuniform case is considered in
Section IV. Let $L_t$ be the amount of workload released at time
slot $t$. This amount of work must be executed by the end of
time slot $t + D$. Since $L_t$ varies over time, we often refer to
it as a workload curve.

In our model, we consider a data center as a collection of
homogeneous servers. The total number of servers $M$ is fixed
and given, but each server can be turned on/off to execute the
workload. We normalize $L_t$ by the processing capability of
each server, i.e., $L_t$ denotes the number of servers required to
execute the workload at time $t$. We assume $L_t \le M$ for all $t$.
Let $x_{i,d,t}$ be the portion of the released workload $L_t$ that is
assigned to be executed at server $i$ at time slot $t + d$ where
$0 \le d \le D$. Let $m_t$ be the number of active servers during
time slot $t$. Then

$$\sum_{i=1}^{m_t} \sum_{d=0}^{D} x_{i,d,t} = L_t \quad \text{and} \quad 0 \le x_{i,d,t} \le 1.$$

Let $x_{i,t}$ be the total workload assigned at time $t$ to server $i$
and $x_t$ be the total assignment at time $t$. Then we can think
of $x_{i,t}$ as the utilization of the $i$th server at time $t$, i.e.,
$0 \le x_{i,t} \le 1$. Thus

$$\sum_{d=0}^{D} x_{i,d,t-d} = x_{i,t} \quad \text{and} \quad \sum_{i=1}^{m_t} x_{i,t} = x_t.$$

From the data center perspective, we focus on two important
decisions during each time slot $t$: (i) determining $m_t$, the
number of active servers, and (ii) determining $x_{i,d,t}$, the
assignment of workload to the servers.

B. Cost Model 

The goal of this paper is to minimize the cost (price) of 
energy consumption in data centers. The energy cost function 
consists of two parts: operating cost and switching cost. 
Operating cost is the cost for executing the workload which 
in our model is proportional to the assigned workload. We 
use the common model for the energy cost for typical servers 
which is an affine function: 

$$C(x) = e_0 + e_1 x$$

where $e_0$ and $e_1$ are constants (e.g., see [8]) and $x$ is the
assigned workload (utilization) of a server at a time slot.

Switching cost $\beta$ is the cost incurred for changing the state
(on/off) of a server. We consider the cost of both turning on
and turning off a server. Switching cost at time $t$ is defined as
follows:

$$S_t = \beta |m_t - m_{t-1}|$$
where $\beta$ is a constant (e.g., see [7], [9]).
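
The two cost terms can be evaluated directly. The following is a minimal Python sketch, assuming the per-server utilizations of a slot are given as a list; the function names are illustrative and not part of the paper, and the sample constants are arbitrary.

```python
def server_cost(x, e0, e1):
    """Affine energy cost C(x) = e0 + e1 * x of one active server at utilization x."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("utilization must lie in [0, 1]")
    return e0 + e1 * x


def slot_operating_cost(utilizations, e0, e1):
    """Operating cost of one slot: sum of C(x) over the active servers."""
    return sum(server_cost(x, e0, e1) for x in utilizations)


def switching_cost(m_prev, m_curr, beta):
    """Switching cost S_t = beta * |m_t - m_{t-1}| (turning on or off)."""
    return beta * abs(m_curr - m_prev)


if __name__ == "__main__":
    # Three active servers now, five in the next slot (arbitrary numbers).
    print(slot_operating_cost([1.0, 0.5, 0.5], e0=1.0, e1=2.0))  # 7.0
    print(switching_cost(3, 5, beta=6.0))                        # 12.0
```

Because the switching term uses an absolute value, powering servers down costs the same as powering them up in this model.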

C. Optimization Problem 

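Given the models above, the goal of a data center is to
choose the number of active servers (capacity) $m_t$ and the
dispatching rule $x_{i,d,t}$ to minimize the total cost during
$[1, T]$, which is captured by the following optimization:

$$\min_{x_t, m_t} \; \sum_{t=1}^{T} \sum_{i=1}^{m_t} C(x_{i,t}) + \beta \sum_{t=1}^{T} |m_t - m_{t-1}| \qquad (1)$$

subject to

$$\sum_{i=1}^{m_t} \sum_{d=0}^{D} x_{i,d,t} = L_t, \qquad \sum_{i=1}^{m_t} \sum_{d=0}^{D} x_{i,d,t-d} = x_t, \qquad \sum_{d=0}^{D} x_{i,d,t-d} = x_{i,t},$$

$$0 \le m_t \le M, \qquad x_{i,d,t} \ge 0, \qquad \forall i, \forall d, \forall t.$$
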

Since the servers are identical, we can simplify the problem
by dropping the index $i$ of $x$. More specifically, for any
feasible solution $x_{i,d,t}$ we can construct another solution by
setting $x_{i,d,t} = \sum_{i'=1}^{m_t} x_{i',d,t} / m_t$ (i.e.,
replacing every $x_{i,d,t}$ by the average of $x_{i,d,t}$ over all
$i$) without changing the value of the objective function while
satisfying all the constraints after this conversion. Then we have
the following optimization equivalent to (1):

$$\min_{x_t, m_t} \; \sum_{t=1}^{T} m_t C(x_t / m_t) + \beta \sum_{t=1}^{T} |m_t - m_{t-1}| \qquad (2)$$

subject to

$$\sum_{d=0}^{D} x_{d,t} = L_t, \qquad \sum_{d=0}^{D} x_{d,t-d} \le m_t, \qquad 0 \le m_t \le M, \qquad x_{d,t} \ge 0,$$

where $x_{d,t}$ represents the portion of the workload $L_t$ to
be executed at a server at time $t + d$. We further simplify
the problem by showing that any optimal assignment for
(2) can be converted to an equivalent assignment that uses the
earliest deadline first (EDF) policy. More formally, we have
the following lemma:

Lemma 1: Let x^^ and x^^ be the optimal assignments of 
workload obtained from the solution of optimization ^ at 
times tr and ts respectively where ts > U and ts — U = < 

D. If 35 with EtJ ^lu-d ^ and Eto+s+i ^hs-d + 
for any ^ < b < D — d then we can obtain another assignments 

= and x\^ = x\^ where Ed=o^d,^.-d = ^ and 

Zld=6'+5+l ^d,ts-d = 0- 

Proof: We prove it by constructing and from 
and x^^ . We change the assignments , < d < D — 6 and 
t^, 6> < (i < I) to obtain x^^ and x^^. We now determine (5. 
Note that all the workloads released between (including) time
slots $t_r - D$ and $t_r$ can be executed at time $t_r$ without
violating deadline since $t_r - D < t_s - D < t_r - \delta < t_r$.
Also all the workloads released between (including) time slots
$t_s - D$ and $t_r$ can be executed at time $t_s$ without violating
deadline since $t_s - D < t_r - \delta < t_r < t_s$. Hence the new
assignment of workloads cannot violate any deadline. We determine
$\delta$



at a point where ^ 

E: 



D-e 

d=6-\-l Xd,tr--d 
5-1 e 

xS 



d=9+5+iXd,ts-d and ELo2;d,t.-d = and 



D- 



Ed=/+i x%t.r-d such that xf^ = x^^. Similarly 



E, 

for x^ , we have the new assignment as: Y^^d^Q ^ x\ 



6'+(5-l 



and xg^^^^^_^_^ — Xld= 



''d,ts-d 



and Ed= 

d- 



d,ts-d 



d=6>+5+l ^d,ts-d — 



^d,ts 



such that 



x^^ = xi^. m 

According to Lemma 1, we do not need both $t$ and $d$ as
indices of $x$. We can use the release time $t$ to determine the
deadline $t + D$. Thus, we drop the index $d$ of $x$. At time $t$,
unassigned workload from $L_{t-D}$ to $L_t$ is executed according
to the EDF policy while minimizing the objective function. To
formulate the constraint that no assignment violates any deadline,
we define the delayed workload $l_t$ with maximum deadline $D$:

$$l_t = \begin{cases} 0 & \text{if } t < D, \\ L_{t-D} & \text{otherwise.} \end{cases}$$
Fig. 1. Illustration of (a) offline optimal solution and (b) VFW($\delta$) for arbitrary
workload generated randomly; time slot length = 2 min, $D = 15$, $\delta = 10$.



We call the delayed curve $l_t$ for the workload the deadline
curve. Thus we have two fundamental constraints on the
assignment of workload for all $t$:

(C1) Deadline Constraint: $\sum_{j=1}^{t} l_j \le \sum_{j=1}^{t} x_j$

(C2) Release Constraint: $\sum_{j=1}^{t} x_j \le \sum_{j=1}^{t} L_j$

Condition (C1) says that all the workloads assigned up to
time $t$ cannot violate deadline and Condition (C2) says that
the assigned workload up to time $t$ cannot be greater than the
total released workload up to time $t$. Using these constraints
we reformulate the optimization (2) as follows:

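$$\min_{x_t, m_t} \; \sum_{t=1}^{T} m_t C(x_t / m_t) + \beta \sum_{t=1}^{T} |m_t - m_{t-1}| \qquad (3)$$

subject to

$$\sum_{j=1}^{t} l_j \le \sum_{j=1}^{t} x_j \le \sum_{j=1}^{t} L_j \quad \forall t, \qquad \sum_{j=1}^{T} x_j = \sum_{j=1}^{T} L_j,$$

$$0 \le x_t \le m_t, \qquad 0 \le m_t \le M \quad \forall t.$$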
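
The constraints (C1) and (C2) can be checked mechanically for a candidate assignment. The sketch below is a hedged illustration, not the authors' code: it assumes slots are indexed from 0 as positions in a Python list, so that the deadline curve is the workload curve shifted right by D positions; the helper names `deadline_curve` and `satisfies_c1_c2` are hypothetical.

```python
from itertools import accumulate


def deadline_curve(L, D):
    """Delayed workload: 0 for the first D slots, then L shifted by D slots."""
    n = len(L)
    return [0.0] * min(D, n) + list(L[: max(n - D, 0)])


def satisfies_c1_c2(L, x, D, tol=1e-9):
    """Check the cumulative deadline (C1) and release (C2) constraints for all t."""
    cum_l = accumulate(deadline_curve(L, D))
    cum_x = accumulate(x)
    cum_L = accumulate(L)
    return all(cl - tol <= cx <= cL + tol
               for cl, cx, cL in zip(cum_l, cum_x, cum_L))


if __name__ == "__main__":
    L = [4, 0, 0, 2]                            # released workload per slot
    print(satisfies_c1_c2(L, [2, 2, 0, 2], 1))  # True: half of slot 0 deferred by one slot
    print(satisfies_c1_c2(L, [1, 1, 2, 2], 1))  # False: work from slot 0 finishes too late
    print(satisfies_c1_c2(L, [2, 2, 1, 1], 1))  # False: executes work before it is released
```

Working with cumulative sums mirrors the formulation: only the running totals of executed work matter, not which job is served in which slot.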



Since the operating cost function $C(\cdot)$ is an affine function,
the objective function is linear as well as the constraints. Hence
it is clear that the optimization (3) is a linear program. Note
that the capacity $m_t$ in this formulation is not constrained to be
an integer. This is acceptable because data centers consist of
thousands of active servers and we can round the resulting
solution with minimal increase in cost. Figure 1(a) illustrates
the offline optimal solutions for $x_t$ and $m_t$ for a dynamic
workload generated randomly. The performance of the optimal
offline algorithm on two realistic workloads is provided in
Section V.
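
The linearity claim can be made explicit with a standard reformulation. The following is a sketch under one assumption: the auxiliary variables $s_t$ are introduced here for illustration and do not appear in the paper. Expanding the affine cost removes the ratio $x_t / m_t$, and bounding each absolute value from above by $s_t$ removes the nonlinearity of the switching term.

```latex
% Operating cost: the ratio cancels because C is affine.
m_t \, C(x_t / m_t) = m_t \left( e_0 + e_1 \frac{x_t}{m_t} \right) = e_0 m_t + e_1 x_t

% Switching cost: replace |m_t - m_{t-1}| by s_t with two linear inequalities.
\min_{x_t, m_t, s_t} \; \sum_{t=1}^{T} \left( e_0 m_t + e_1 x_t \right) + \beta \sum_{t=1}^{T} s_t
\quad \text{subject to} \quad
s_t \ge m_t - m_{t-1}, \qquad s_t \ge m_{t-1} - m_t \quad \forall t,
% together with the constraints of (3).
```

Since $\beta > 0$ and the objective is minimized, each $s_t$ settles at $|m_t - m_{t-1}|$ in an optimal solution, so the reformulated problem has the same optimal value as (3).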

III. Valley Filling with Workload 

In this section we consider the online case, where at any
time $t$ we do not have information about the future workload
$L_{t'}$ for $t' > t$. At each time $t$, we determine $x_t$ and $m_t$
by applying optimization over the already released unassigned
workload, which has deadlines in the future $D$ slots. Note that the
workload released at or before $t$ cannot be delayed to be
assigned after time slot $t + D$. Hence we do not optimize over
more than $D + 1$ slots. We simplify the online optimization
by solving only for $m_t$ and determine $x_t$ by setting $x_t = m_t$
at time $t$. This prevents the online algorithm from wasting
execution capacity that cannot be used later for executing
workload. But the cost due to switching in the online algorithm
may be higher than in the offline algorithm. Thus our goal is to
design strategies to reduce the switching cost. In the online
algorithm, we reduce the switching cost by optimizing the
total cost for the interval $[t, t + D]$.

Fig. 2. The curves $L_t$ and $l_t^\delta$ and their intersection points.

When the deadline is uniform, we can reduce the switching
cost even more by looking beyond $D$ slots. We do that
by accumulating some workload from periods of high load
and executing that workload later in valleys without violating
constraints (C1) and (C2). To determine the amount of
accumulation and execution we use the '$\delta$-delayed workload'.
Thus the online algorithm, namely Valley Filling with Workload
(VFW($\delta$)), looks ahead $\delta$ slots to determine the amount of
execution. Let $l_t^\delta$ be the $\delta$-delayed curve with a delay
of $\delta$ slots for $0 \le \delta \le D$:

$$l_t^\delta = \begin{cases} 0 & \text{if } t < \delta, \\ L_{t-\delta} & \text{otherwise.} \end{cases}$$

Then we can call the deadline curve the $D$-delayed curve and
represent it by $l_t^D$. We determine the amount of accumulation
and execution by controlling the set of feasible choices for
$m_t$ in the optimization. For this we use the $\delta$-delayed curve
to restrict the amount of accumulation. By lower bounding
$m_t$ for the valley (low workload) and upper bounding it for
the high workload, we control the execution in the valley
and accumulation in the other parts of the curve. In the
online algorithm we have two types of optimizations: Local
Optimization and Valley Optimization. Local Optimization is
used to smooth the 'wrinkles' (small variations in the workload
in adjacent slots, e.g., see Figure 2) within $D$ consecutive slots
and accumulate some workload, and Valley Optimization fills
the valleys with the accumulated workload.

A. Local Optimization 

The local optimization applies optimization over the future $D$
slots and finds the optimum capacity for the current slot by
executing no more than the $\delta$-delayed workload. Let $t$ be the
current time slot. At this slot we apply a slightly modified
version of the offline optimization (3) in the interval $[t, t + D]$.
We apply the following optimization LOPT($l_t$, $l_t^\delta$,
$m_{t-1}$, $M$) to determine $m_t$ in order to smooth the wrinkles by
optimizing over $D$ consecutive slots. We restrict the amount
of execution to be no more than the $\delta$-delayed workload while
satisfying the deadline constraint (C1).



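$$\min_{m_t} \; (e_0 + e_1) \sum_{j=t}^{t+D} m_j + \beta \sum_{j=t}^{t+D} |m_j - m_{j-1}| \qquad (4)$$

subject to

$$\sum_{j=1}^{t} l_j \le \sum_{j=1}^{t} m_j \le \sum_{j=1}^{t} l_j^\delta, \qquad \sum_{j=1}^{t+D} m_j = \sum_{j=1}^{t+\delta} l_j^\delta,$$

$$0 \le m_k \le M, \qquad t \le k \le t + D.$$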



After solving the local optimization, we get the value of
$m_t$ for the current time slot and assign $x_t = m_t$. For the
next time slot $t + 1$ we solve the local optimization again to
find the values for $x_{t+1}$ and $m_{t+1}$. Note that the deadline
constraint (C1) and the release constraint (C2) are satisfied at
time $t$, since from the formulation
$\sum_{j=1}^{t} l_j \le \sum_{j=1}^{t} x_j \le \sum_{j=1}^{t} l_j^\delta \le \sum_{j=1}^{t} L_j$.



B. Valley Optimization 

In valley optimization, the accumulated workload from the 
local optimization is executed in 'global valleys'. Before 
giving the formulation for the valley optimization we need 
to detect a valley. 

Let $p_1, p_2, \ldots, p_n$ be the sequence of intersection points of
the $L_t$ and $l_t^\delta$ curves (see Figure 2) in nondecreasing order
of their x-coordinates ($t$ values). Let $p'_1, p'_2, \ldots, p'_n$ be
the sequence of points on $l_t^\delta$ with delay $\delta$ added to each
intersection point $p_1, p_2, \ldots, p_n$ on $l_t^\delta$ such that
$t'_s = t_s + \delta$ for all $1 \le s \le n$.
We discard all the intersection points (if any) between $p_s$ and
$p'_s$ from the sequence such that $t_{s+1} \ge t'_s$. Note that at each
intersection point $p_s$, the curve $l_t^\delta$ from $p_s$ to $p'_s$ is
known. To determine whether the curve $l_t^\delta$ between $p_s$ and
$p'_s$ is a valley, we calculate the area



If $A$ is negative, then we regard the curve between $p_s$ and
$p'_s$ as a global valley though it may contain several peaks and
valleys. If the curve between $p_s$ and $p'_s$ is a global valley, we
fill the valley with some (possibly all) of the accumulated
workload by executing more than the $\delta$-delayed workload
while satisfying the release constraint (C2). For each $t$, we
apply the following optimization VOPT($l_t$, $L_t$, $m_{t-1}$, $M$) in
the interval $[t, t + D]$ to find the value of $m_t$ where
$t_s \le t \le t'_s$.
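$$\min_{m_t} \; (e_0 + e_1) \sum_{j=t}^{t+D} m_j + \beta \sum_{j=t}^{t+D} |m_j - m_{j-1}| \qquad (5)$$

subject to

$$\sum_{j=1}^{t} l_j^\delta \le \sum_{j=1}^{t} m_j \le \sum_{j=1}^{t} L_j, \qquad \sum_{j=1}^{t+D} m_j = \sum_{j=1}^{t} L_j,$$

$$0 \le m_k \le M, \qquad t \le k \le t + D.$$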

Note that the deadline constraint (C1) and the release
constraint (C2) are satisfied at time $t$, since
$\sum_{j=1}^{t} l_j \le \sum_{j=1}^{t} l_j^\delta \le \sum_{j=1}^{t} x_j \le \sum_{j=1}^{t} L_j$.
We apply the valley optimization (5) for each
$t_s \le t \le t'_s$ and the local optimization (4) for each
time slot $t$ where $t \in \{[1, T - D - 1] - [t_s, t'_s]\}$ for all $t_s$.
For each $t \in [T - D, T]$ we apply the valley optimization
(5) for a global valley in the interval $[t, T]$ in order to execute
all the accumulated workload. Algorithm 1 summarizes the
procedures for VFW($\delta$). For each new time slot $t$, Algorithm 1
detects a valley by checking whether the curves $l_t^\delta$ and $L_t$
intersect. If $t$ is inside a valley, Algorithm 1 applies valley
optimization (VOPT); otherwise it applies local optimization (LOPT).
Figure 1(b) illustrates the nature of solutions from VFW($\delta$) for
$x_t$ and $m_t$. Note that $\delta$ is a parameter of the online algorithm
VFW($\delta$).



Algorithm 1 VFW(δ)

valley ← 0; m_0 ← 0
l[1 : D] ← 0; l^δ[1 : δ] ← 0
for each new time slot t do
    l[t + D] ← L[t]
    l^δ[t + δ] ← L[t]
    if valley = 0 and l^δ intersects L then
        Calculate Area A
        if A < 0 then
            valley ← 1
        end if
    else if valley > 0 and valley < δ then
        valley ← valley + 1
    else
        valley ← 0
    end if
    if valley = 0 then
        m[t : t + D] ← LOPT(l[1 : t], l^δ[1 : t + δ], m_{t-1}, M)
    else
        m[t : t + D] ← VOPT(l[1 : t], L[1 : t], m_{t-1}, M)
    end if
    x_t ← m_t
end for
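
The valley counter in Algorithm 1 behaves like a small state machine. The Python sketch below isolates just that part, under the assumption that the intersection test and the area A are computed elsewhere and passed in; the function names are hypothetical.

```python
def next_valley_state(valley, intersects, area, delta):
    """Update the valley counter for one new time slot.

    valley == 0 means "outside a valley"; 1..delta counts slots spent inside one.
    """
    if valley == 0 and intersects:
        # A valley starts only if the area A is negative.
        return 1 if area < 0 else 0
    if 0 < valley < delta:
        return valley + 1
    return 0


def optimization_for(valley):
    """Select which optimization the algorithm applies in this slot."""
    return "LOPT" if valley == 0 else "VOPT"


if __name__ == "__main__":
    delta = 3
    valley = 0
    modes = []
    # Slot 0 sees an intersection with negative area; later slots see none.
    for intersects, area in [(True, -1.0), (False, 0.0), (False, 0.0), (False, 0.0)]:
        valley = next_valley_state(valley, intersects, area, delta)
        modes.append(optimization_for(valley))
    print(modes)  # ['VOPT', 'VOPT', 'VOPT', 'LOPT']
```

With this counter, a detected valley keeps the algorithm in VOPT mode for δ consecutive slots before it falls back to LOPT.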



C. Analysis of the Algorithm 

We first prove the feasibility of the solutions from the
VFW($\delta$) algorithm and then analyze the competitive ratio of
this algorithm with respect to the offline formulation (3). First,
we have the following theorem about the feasibility.

Theorem 1: The VFW($\delta$) algorithm gives a feasible solution
for any $0 \le \delta \le D$.

Proof: We prove this theorem inductively by showing that
the choice of any feasible $m_t$ from an optimization applied
in the interval $[t, t + D]$ does not result in infeasibility in
the optimization applied in $[t + 1, t + D + 1]$. Initially, the
optimization in VFW($\delta$) is applied for the interval $[1, D + 1]$
with $\sum_{j=1}^{k} l_j = 0$ for $1 \le k \le D$. Hence the optimization
applied in the interval $[1, D + 1]$ gives feasible $m_1$ because
$\sum_{j=1}^{k} l_j^\delta \le \sum_{j=1}^{k} L_j$ for $1 \le k \le D$.

Now suppose VFW($\delta$) gives feasible $m_t$ in an interval
$[t, t + D]$. We have to prove that there exists a feasible choice
of $m_{t+1}$ for the optimization applied at $[t + 1, t + D + 1]$. The
deadline constraint (C1) and the release constraint (C2) are
satisfied for rrit. Hence, Yfj=i^f ^ ^]=i^j - Ylj=i^3- 
Since < S < D, EU^? ^ T.]t\lf < EU^3 ^ 
Zljti^j - ^]^i^3 ^ Y^jtX^j- Thus for any feasible 
choice of $m_t$, we can always obtain a feasible solution for $m_{t+1}$
such that the above inequality holds. ■

We now analyze the competitive ratio of the online
algorithm with respect to the offline formulation (3).
We denote the operating cost of the solution vectors
$X = (x_1, x_2, \ldots, x_T)$ and $M = (m_1, m_2, \ldots, m_T)$ by
$cost_o(X, M) = \sum_{t=1}^{T} m_t C(x_t / m_t)$, the switching cost by
$cost_s(X, M) = \beta \sum_{t=1}^{T} |m_t - m_{t-1}|$ and the total cost by
$cost(X, M) = cost_o(X, M) + cost_s(X, M)$. We have the
following lemma.

Lemma 2: $cost_s(X, M) \le 2\beta \sum_{t=1}^{T} m_t$.

Proof: The switching cost at time $t$ is
$S_t = \beta |m_t - m_{t-1}| \le \beta (m_t + m_{t-1})$, since $m_t \ge 0$.
Then $cost_s(X, M) \le \beta \sum_{t=1}^{T} (m_t + m_{t-1}) \le 2\beta \sum_{t=1}^{T} m_t$,
where $m_0 = 0$. ■
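
Lemma 2 is easy to sanity-check numerically. The sketch below is only an illustration of the inequality on one arbitrary capacity sequence, not a proof; the function names are hypothetical and $m_0 = 0$ is assumed as in the lemma.

```python
def total_switching_cost(m, beta, m0=0.0):
    """cost_s = beta * sum_t |m_t - m_{t-1}|, starting from m_0 = m0."""
    prev = m0
    total = 0.0
    for mt in m:
        total += beta * abs(mt - prev)
        prev = mt
    return total


def lemma2_upper_bound(m, beta):
    """Right-hand side of Lemma 2: 2 * beta * sum_t m_t."""
    return 2.0 * beta * sum(m)


if __name__ == "__main__":
    m = [3, 0, 5, 5, 1]          # arbitrary nonnegative capacities
    beta = 6.0
    print(total_switching_cost(m, beta))  # 90.0
    print(lemma2_upper_bound(m, beta))    # 168.0
```

The bound is loose whenever capacity changes slowly, which is exactly the behavior the online algorithms aim for.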

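Let $X^*$ and $M^*$ be the offline solution vectors from
optimization (3). We have the following theorem about the
competitive ratio.

Theorem 2: $cost(X, M) \le \frac{e_0 + e_1 + 2\beta}{e_0 + e_1} \, cost(X^*, M^*)$.

Proof: Since the offline optimization assigns all the workload
in the $[1, T]$ interval,
$\sum_{t=1}^{T} x_t^* = \sum_{t=1}^{T} L_t \le \sum_{t=1}^{T} m_t^*$,
where we used $x_t^* \le m_t^*$ for all $t$. Hence
$cost(X^*, M^*) \ge cost_o(X^*, M^*) = \sum_{t=1}^{T} m_t^* C(x_t^* / m_t^*) = \sum_{t=1}^{T} (e_0 m_t^* + e_1 x_t^*) \ge \sum_{t=1}^{T} (e_0 + e_1) L_t$.

In the online algorithm we set $x_t = m_t$ and
$\sum_{j=1}^{t} m_j \le \sum_{j=1}^{t} L_j$ for all $t \in [1, T]$. Hence by
Lemma 2 we have
$cost(X, M) = cost_o(X, M) + cost_s(X, M) \le \sum_{t=1}^{T} (e_0 + e_1) m_t + 2\beta \sum_{t=1}^{T} m_t \le (e_0 + e_1) \sum_{t=1}^{T} L_t + 2\beta \sum_{t=1}^{T} L_t = (e_0 + e_1 + 2\beta) \sum_{t=1}^{T} L_t$.
Dividing this upper bound by the lower bound on $cost(X^*, M^*)$
gives the claimed ratio. ■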

Note that the competitive ratio does not depend on $\delta$ or $D$.
Hence the performance of VFW($\delta$) is within a constant
factor of the offline algorithm. Although the ratio seems to be
large, the performance of the VFW($\delta$) algorithm is close to the
offline optimal algorithm as evaluated in Section V.
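
To get a feel for the size of the constant, the bound of Theorem 2 can be evaluated for concrete parameters. The snippet below is a small illustrative helper (the function name is hypothetical); the sample values $e_0 = 1$, $e_1 = 0$, $\beta = 6$ are the ones chosen in the experimental setup of Section V.

```python
def competitive_ratio_bound(e0, e1, beta):
    """Worst-case ratio (e0 + e1 + 2*beta) / (e0 + e1) from Theorem 2."""
    if e0 + e1 <= 0:
        raise ValueError("e0 + e1 must be positive")
    return (e0 + e1 + 2.0 * beta) / (e0 + e1)


if __name__ == "__main__":
    print(competitive_ratio_bound(e0=1.0, e1=0.0, beta=6.0))  # 13.0
```

The ratio grows linearly in β relative to the per-slot operating cost, so expensive switching makes the worst-case guarantee weaker even though it does not depend on δ or D.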



IV. Generalized Capacity Provisioning 

We now consider the general case where the deadline
requirement is not the same for all the workload. Let $\nu$ be the
maximum possible deadline. We decompose the workload
according to the associated deadlines. Let $L_{d,t} \ge 0$ be
the portion of the workload released at time $t$ that has deadline
$d$, for $0 \le d \le \nu$. We have

$$\sum_{d=0}^{\nu} L_{d,t} = L_t.$$

The workload to be executed at any time slot $t$ can come
from different previous slots $t - d$ where $0 \le d \le \nu$, as
illustrated in Figure 3(a). Hence we redefine the deadline curve
$l_t$ and represent it by $l'_t$. Assuming $L_{d,t} = 0$ if $t < 0$, we
define

$$l'_t = \sum_{d=0}^{\nu} L_{d,(t-d)}.$$

Then the offline formulation remains the same as formulation
(3) with the deadline curve $l_t$ replaced by $l'_t$.

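$$\min_{x_t, m_t} \; \sum_{t=1}^{T} m_t C(x_t / m_t) + \beta \sum_{t=1}^{T} |m_t - m_{t-1}| \qquad (6)$$

subject to

$$\sum_{j=1}^{t} l'_j \le \sum_{j=1}^{t} x_j \le \sum_{j=1}^{t} L_j \quad \forall t, \qquad \sum_{j=1}^{T} x_j = \sum_{j=1}^{T} L_j,$$

$$0 \le x_t \le m_t, \qquad 0 \le m_t \le M \quad \forall t.$$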



We now consider the online case. Delaying the workload
up to its maximum deadline may increase the switching cost
since it may increase the variation in the workload compared to
the original workload (see Figure 3(b)). Hence at each time we
need to determine the optimum assignment and capacity that
reduces the switching cost from the original workload while
satisfying each individual deadline. We can apply the VFW($\delta$)
algorithm from the previous section with $D = D_{min}$ where
$D_{min}$ is the minimum deadline for the workload. If $D_{min}$
is small, VFW($\delta$) does not work well because $\delta < D_{min}$
becomes too small to detect a valley. Hence we use a novel
approach for distributing the workload $L_t$ over the $D_t$ slots
such that the change in the capacity between adjacent time
slots is minimal (see Figure 3(c)). We call this algorithm the
Generalized Capacity Provisioning (GCP) algorithm.

Fig. 3. Illustration of workload with different deadline requirements: (a)
workload released at different times has different deadlines, (b) the delayed
workload $l'_t$ may increase the switching cost due to large variation, (c)
distribution of workload in adjacent slots by GCP to reduce the variation
in workload.

In the GCP algorithm, we apply optimization to determine
$m_t$ at each time slot $t$ and set $x_t = m_t$. The optimization
is applied over the interval $[t, t + \nu]$ since at time slot $t$ we
can have workload that has a deadline of up to $t + \nu$ slots. Hence
at each time $t$, the released workload is a vector of $\nu + 1$
dimensions. Let $L_t = (L_{0,t}, L_{1,t}, \ldots, L_{\nu,t})$ where
$L_{d,t} = 0$ if there is no workload with deadline $d$ at time $t$. Let
$y_t$ be the vector of unassigned workload released up to time
$t$. The vector $y_t$ is updated from $y_{t-1}$ at each time slot
by subtracting the capacity $m_{t-1}$ and then adding $L_t$. Note
that $m_{t-1}$ is subtracted from the vector $y_{t-1}$ in order to use
unused capacity to execute already released workload at time
$t - 1$ by following the EDF policy (see lines 4-17 in Algorithm 2).
Let $y'_{t-1} = (y'_{0,t-1}, y'_{1,t-1}, y'_{2,t-1}, \ldots, y'_{\nu,t-1})$ be the
vector after subtracting $m_{t-1}$, with $y'_{0,t-1} = 0$ and
$y'_{j,t-1} \ge 0$ for $1 \le j \le \nu$. Then
$y_t = (y'_{1,t-1}, y'_{2,t-1}, \ldots, y'_{\nu,t-1}, 0) + L_t$
where $y_t = (0, 0, \ldots, 0)$ if $t \le 0$. Then the optimization
GCP-OPT($y_t$, $m_{t-1}$, $M$) applied at each $t$ over the interval
$[t, t + \nu]$ is as follows:



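$$\min_{m_t} \; (e_0 + e_1) \sum_{j=t}^{t+\nu} m_j + \beta \sum_{j=t}^{t+\nu} |m_j - m_{j-1}| \qquad (7a)$$

subject to

$$\sum_{j=0}^{\nu} m_{t+j} = \sum_{j=0}^{\nu} y_{j,t} \qquad (7b)$$

$$\sum_{k=0}^{j} m_{t+k} \ge \sum_{k=0}^{j} y_{k,t}, \qquad 0 \le j \le \nu - 1 \qquad (7c)$$

$$0 \le m_{t+j} \le M, \qquad 0 \le j \le \nu \qquad (7d)$$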



Note that the optimization (7) solves for $\nu + 1$ values.
We only use $m_t$ as the capacity and assignment of workload
at time $t$. Algorithm 2 summarizes the procedures for GCP.
The GCP algorithm gives feasible solutions because it works
with the unassigned workload, constraint (7c) ensures the
deadline constraint (C1) and constraint (7b) ensures the release
constraint (C2). The competitive ratio for the GCP algorithm
is the same as the competitive ratio for VFW($\delta$) because in GCP
$m_t = x_t$ and the release constraint (C2) holds at every $t$, making
$\sum_{t=1}^{T} m_t = \sum_{t=1}^{T} x_t \le \sum_{t=1}^{T} L_t$.

V. Experimental Results 

In this section, we evaluate the cost incurred by the
VFW($\delta$) and GCP algorithms relative to the optimal solution in
the context of workload generated from realistic data.

A. Experimental Setup

We aim to use realistic parameters in the experimental setup
and provide conservative estimates of the cost savings resulting
from our proposed VFW($\delta$) and GCP algorithms.



Algorithm 2 GCP

 1: y[0 : ν] ← 0
 2: m_0 ← 0
 3: for each new time slot t do
 4:     uc ← m_{t-1}   {uc represents the unused capacity}
 5:     for i = 0 to ν do
 6:         if uc ≤ 0 then
 7:             y'[i] ← y[i]
 8:         else
 9:             uc ← uc − y[i]
10:             if uc < 0 then
11:                 y'[i] ← −uc
12:             else
13:                 y'[i] ← 0
14:             end if
15:         end if
16:     end for
17:     y[0 : ν] ← {y'[1 : ν], 0} + L_t[0 : ν]
18:     m[t : t + ν] ← GCP-OPT(y[0 : ν], m_{t-1}, M)
19:     x_t ← m_t
20: end for
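
The bookkeeping step of Algorithm 2 (spending the previous capacity in EDF order, shifting the remaining deadlines by one slot, then adding the new releases) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: `gcp_update` is a hypothetical name, `y[d]` holds unassigned work with `d` slots left, and the explicit deadline-miss check is an addition for safety.

```python
def gcp_update(y, m_prev, L_t, tol=1e-9):
    """Return the unassigned-workload vector for the next slot.

    y      : list, y[d] = unassigned work that must finish within d more slots
    m_prev : capacity used in the previous slot (x = m in GCP)
    L_t    : list, newly released work per deadline class, same length as y
    """
    uc = m_prev                      # unused capacity still to be spent
    remaining = []
    for work in y:                   # smallest remaining deadline first (EDF)
        if uc <= 0:
            remaining.append(work)   # capacity exhausted: work stays queued
        else:
            uc -= work
            remaining.append(-uc if uc < 0 else 0.0)
    if remaining and remaining[0] > tol:
        raise ValueError("work with deadline 0 was not executed")
    shifted = remaining[1:] + [0.0]  # every deadline is now one slot closer
    return [s + new for s, new in zip(shifted, L_t)]


if __name__ == "__main__":
    # 2 units due now, 3 due in one slot, 1 due in two slots; capacity 4 was used.
    print(gcp_update([2, 3, 1], m_prev=4, L_t=[1, 0, 2]))  # [2, 1, 2.0]
```

In the example, capacity 4 clears the 2 urgent units and 2 of the 3 units in the next class; the leftovers move one deadline class closer before the new releases are added.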
Cost benchmark: Currently data centers typically do not use
dynamic capacity provisioning based on the variation of the
workload [7]. A naive approach to capacity provisioning is
to follow the workload curve and determine the capacity and
assignment of workload accordingly. Clearly this is not a good
approach because it does not take into account the cost incurred
due to switching. Yet it is a very conservative benchmark, as it
does not waste any execution capacity and meets all the
deadlines. We compare the total cost from the VFW($\delta$) and GCP
algorithms with the 'follow the workload' ($x = m = L$) strategy
and evaluate the cost reduction.
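
The benchmark comparison amounts to evaluating the same cost function on two schedules. The sketch below uses a toy four-slot workload chosen purely for illustration (it is not data from the paper); the function names are hypothetical, and the default parameters follow the values reported in this section.

```python
def total_cost(x, m, e0=1.0, e1=0.0, beta=6.0, m0=0.0):
    """Operating cost sum(e0*m_t + e1*x_t) plus switching cost beta*sum|m_t - m_{t-1}|."""
    operating = sum(e0 * mt + e1 * xt for xt, mt in zip(x, m))
    switching = 0.0
    prev = m0
    for mt in m:
        switching += beta * abs(mt - prev)
        prev = mt
    return operating + switching


def follow_the_workload_cost(L, **params):
    """Naive benchmark: capacity and assignment both equal the workload (x = m = L)."""
    return total_cost(L, L, **params)


def cost_reduction(baseline, cost):
    """Fractional saving of a schedule relative to the benchmark."""
    return 1.0 - cost / baseline


if __name__ == "__main__":
    L = [2, 8, 2, 8]             # toy workload, not a trace from the paper
    deferred = [2, 5, 5, 8]      # defers 3 units from slot 2 to slot 3
    base = follow_the_workload_cost(L)
    smooth = total_cost(deferred, deferred)
    print(base, smooth)          # 140.0 68.0
    print(round(cost_reduction(base, smooth), 3))
```

Both schedules execute the same 20 units of work, so with $e_1 = 0$ the operating cost is identical and the entire saving comes from avoiding on/off switching.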

Cost function parameters: The total cost is characterized by
$e_0$ and $e_1$ for the operating cost and $\beta$ for the switching
cost. In the operating cost, $e_0$ represents the proportion of the
fixed cost and $e_1$ represents the load-dependent energy
consumption. The energy consumption of current servers is dominated
by the fixed cost [10]. Therefore we choose $e_0 = 1$ and $e_1 = 0$.
The switching cost parameter $\beta$ represents the wear-and-tear
due to changing power states in the servers. We choose $\beta = 6$
for a slot length of 10 minutes so that it works as an estimate
of the time a server should be powered down (typically one
hour [7], [9]) to outweigh the switching cost with respect to
the operating cost.

Workload description: We use two real-world HTTP traces as
examples of dynamic workload. The traces are taken from the HTTP
request logs for one day (24 hours) from servers at the University
of California San Diego (UCSD) and the San Diego Super Computing
Center (SDSC). We counted the number of different types of requests
over a time slot length of 10 minutes and use that as a dynamic
workload (Figure 4). The two examples exhibit strong diurnal
properties and range from bursty workload (UCSD) to typical
workload (SDSC). We then assign a deadline to each workload.
For VFW($\delta$), the deadline $D$ is uniform and is assigned in
terms of the number of slots the workload can be delayed. For
our experiments, we vary $D$ from 1 to 15 slots, which gives
latency from 10 minutes up to 2 hours 30 minutes. Note that
these values of $D$ are intended for the synthetic dynamic
workload and not for real HTTP requests. For GCP, we use the
request types to generate workload with different deadline
requirements. For requests of the same type we chose the same
value of $d$, but assigned different values to different types,
e.g., image files, video files, text files, etc. We picked ten
different file types from the requests and used their relative
frequency to assign deadlines varying from 1 to 10 slots.

Fig. 4. Illustration of the traces for dynamic workload used in the
experiments: (a) UCSD, (b) SDSC; horizontal axis: time (hours).

B. Experimental Analysis 

In this section we analyze the impact of a wide variety of
parameters on the cost savings provided by VFW($\delta$) and GCP. We
then compare VFW($\delta$) and GCP for uniform deadline (GCP-U)
and justify the practical significance of our work.

Impact of deadline: The first parameter we study is the
impact of different deadline requirements of the workload on
the cost savings. Figure 5 shows that even for a deadline $D$
as small as 2 slots, the cost is reduced by ~40% for GCP-U
and ~30% for VFW($\delta$), while the offline algorithm gives a
cost saving of ~60% compared to the naive algorithm. It
also shows that for all the algorithms, larger $D$ gives more
cost savings as more workload can be delayed to reduce
the variation in the workload. As $D$ grows larger, the cost
reduction from GCP-U and VFW($\delta$) approaches the offline cost
saving, which is as much as 70%. For VFW($\delta$), the cost saving
is always less than that of GCP-U for typical workload (SDSC), but
for bursty workload (UCSD) VFW($\delta$) performs better than
GCP-U as filling valleys with the workload becomes more
beneficial when $D$ becomes large.

Impact of δ for VFW(δ): The parameter δ is used as a 
lookahead to detect a valley in the VFW(δ) algorithm. If δ is 
large, valley detection performs well but it may be too late to 
fill the valley due to the deadlines. On the other hand, if δ is 
small, valley detection does not work well because the capacity 
has already gone down to the lowest value. Figure 6 illustrates 
the valley detection for small δ and large δ. Although the 
cost savings from VFW(δ) depend largely on the nature of the 
workload curve, Figure 7 shows that δ ≈ D/2 is a conservative 
estimate for better cost savings. 
Fig. 6. Valley detection for (a) small δ and (b) large δ for VFW(δ). 
Fig. 7. Impact of δ for VFW(δ) with deadline D = 15. 



Performance of GCP: We evaluated the cost savings from 
GCP by assigning different deadlines to different types of 
workload. For conservative estimates of deadline requirements, 
we found ~60% cost reduction for the UCSD workload and 
~40% cost reduction for the SDSC workload, each of which 
remains close to the offline optimal solution. 

Comparison of VFW(δ) and GCP: We compare GCP 
for uniform deadline (GCP-U) with VFW(δ) for δ = D/2. 
Figure 8 illustrates the cost reduction for VFW(δ) and GCP-U 
with uniform deadline D = 15 and δ = 8. Surprisingly, for 
bursty workload (UCSD), VFW(δ) works much better than 
GCP-U for any value of δ. But for typical workload, GCP-U 
works better. Hence the performance of both online 
algorithms depends largely on the nature of the workload. 
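
The lookahead used in this comparison can be written down as a one-line rule. The sketch below assumes that δ = D/2 is rounded up to a whole number of slots, which is consistent with the reported pair D = 15 and δ = 8; the rounding direction and the name `default_lookahead` are assumptions for illustration only.

```python
import math


def default_lookahead(D: int) -> int:
    """Pick delta as D/2 rounded up to a whole number of slots (assumed)."""
    return math.ceil(D / 2)


print(default_lookahead(15))
```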

Fig. 8. Comparison of VFW(δ), GCP-U and Offline algorithms with uniform 
deadline D = 15 and δ = 8. 

Practical significance: We now justify different practical 
aspects of capacity provisioning via dynamic deferral. Although 
we have used synthesized workload for our experiments, there 
are many real-world examples in High Performance Computing 
(HPC) that have real deadline requirements, e.g., see 
[11], [12]. Lee et al. [11] conducted a survey to measure the 
impact of delay on user satisfaction and represented delay as 
utility functions, which are often flat, indicating fixed deadline 
requirements. The deadline requirements in HPC are usually 
large, on the order of hours, but in other applications they might 
be small, on the order of minutes. Hence we studied the impact of 
different deadline requirements varying from 10 minutes to 
2 hours and 30 minutes (see Figure 5). Our results highlight 
that even if the deadline is as small as one slot (10 minutes), 
we save around 40% of the energy consumption compared to not 
using dynamic deferral. Note that we do not use any prediction 
window [3], [13] for the algorithms. Using a prediction window, 
our algorithms can be modified to optimize over more future 
slots, which eventually results in more energy savings. 

VI. Related Work 

Given the importance of energy management in data centers, 
many scholars have applied energy-aware scheduling because 
of its low cost and practical applicability. In energy-aware 
scheduling, most work tries to find a balance between energy 
cost and performance loss through DVFS (Dynamic Voltage 
and Frequency Scaling) and DPM (Dynamic Power Management), 
which are the most common system-level power 
saving methods. Beloglazov et al. [14] give a taxonomy 
and survey of energy management in data centers. Dynamic 
capacity provisioning is a DPM technique. Chase et al. 
[15] introduce executable utility functions to quantify the 
value of performance and use an economic approach to achieve 
resource provisioning. Pinheiro et al. [16] consider resource 
provisioning at both the application and operating system levels. 
They dynamically turn nodes on or off to adapt to the 
changing load, but do not consider the switching cost. 

Most work on dynamic capacity provisioning for independent 
workload uses models based on queueing theory [17], 
[18], or control theory [20], [21], [22]. Recently Lin 
et al. [3] used a more general and common energy model and 
delay model, which are not confined to queueing or control 
theoretic methods. In our paper we also use a general energy 
model to extend its usage for latency requirements. However, 
our work is different from Lin et al. [7] in several ways. First, 
the performance of their LCP algorithm depends on the 
peak-to-mean ratio (PMR) of the workload, while our algorithms 
perform better for workloads with high PMR. Second, in the LCP 
algorithm, a smaller time slot is desirable because the switching 
cost depends on the length of the time slot, and thus the difference 
between the upper and lower limits for the capacity increases for 
the LCP algorithm and its capacity curve comes closer to the 
workload curve, resulting in a higher switching cost than ours. 
Finally, regarding delay, LCP treats delay as part of the objective 
and aims to minimize the average delay, while we regard it 
as a deadline constraint. Instead of penalizing the delay, 
we provide a guarantee on the maximum delay and utilize delay 
to reduce the switching cost of the servers. 

Many real-world applications require a delay bound or 
deadline constraint, e.g., see Lee et al. [11]. When combined 
with energy conservation, the deadline is usually a critical 
tool for trading off performance loss against energy consumption. 
Energy-efficient deadline scheduling was first studied by Yao 
et al. [23]. They proposed one offline algorithm and two 
online algorithms, which aim to minimize energy consumption 
for independent jobs with deadline constraints on a single 
variable-speed processor. After that, a series of works 
considered online deadline scheduling in different scenarios, 
such as discrete-voltage processors, tree-structured tasks, 
processors with a sleep state, and overloaded systems [24], [25]. 
There is also some work on energy-aware scheduling in 
multiprocessor systems, most of which focuses on real-time tasks 
[26], [27], [28]. In the context of data centers, most work on 
energy management only addresses minimizing the average 
delay and does not give any bound on delay, except Mukherjee 
et al. [6]. They propose an offline algorithm, SCINT, and an 
online algorithm, EDF-LRH, that consider deadline constraints 
to minimize the computation, cooling and migration energy. 
In the online algorithm, they simply use the EDF algorithm to 
satisfy the deadlines, while in the offline algorithm they use 
a genetic algorithm. However, their work addresses a job assignment 
problem, in which the number of needed servers is given in advance, 
rather than a dynamic resource provisioning problem. 

VII. Conclusion 

In this paper we have proposed two new algorithms, VFW(δ) 
and GCP, for capacity provisioning in data centers while 
guaranteeing the deadlines. The algorithms utilize the latency 
requirements of workloads for cost savings and guarantee 
bounded cost and bounded latency under very general settings: 
arbitrary workload, general deadlines and general energy cost 
models. Further, both the VFW(δ) and GCP algorithms are 
simple to implement and do not require significant 
computational overhead. 

Our experiments highlight that significant cost and energy 
savings can be achieved via dynamic deferral of workload. 
In this paper, we limited our motivation to data 
centers. But in the future 'cloudy world', where almost all 
computation will be outsourced, deadline and latency requirements 
will attract more attention. Therefore it would be worthwhile and 
interesting to apply the concept of dynamic deferral to other 
load balancing or scheduling problems. We leave that as 
future work. 